45 research outputs found

    "Learn the Facts About COVID-19": Analyzing the Use of Warning Labels on TikTok Videos

    During the COVID-19 pandemic, health-related misinformation and harmful content shared online had a significant adverse effect on society. To mitigate this adverse effect, mainstream social media platforms employed soft moderation interventions (i.e., warning labels) on potentially harmful posts. Despite the recent popularity of these moderation interventions, we lack empirical analyses aiming to uncover how these warning labels are used in the wild, particularly during challenging times like the COVID-19 pandemic. In this work, we analyze the use of warning labels on TikTok, focusing on COVID-19 videos. First, we construct a set of 26 COVID-19 related hashtags; then we collect 41K videos that include those hashtags in their description. Second, we perform a quantitative analysis on the entire dataset to understand the use of warning labels on TikTok. Then, we perform an in-depth qualitative study, using thematic analysis, on 222 COVID-19 related videos to assess the content and the connection between the content and the warning labels. Our analysis shows that TikTok broadly applies warning labels to videos, likely based on the hashtags included in the description. More worrying is the addition of COVID-19 warning labels to videos whose actual content is not related to COVID-19 (23% of the cases in a sample of 143 English videos that are not related to COVID-19). Finally, our qualitative analysis of a sample of 222 videos shows that 7.7% of the videos share misinformation/harmful content and do not include warning labels, 37.3% share benign information and include warning labels, and 35% of the videos that share misinformation/harmful content (and need a warning label) are made for fun. Our study demonstrates the need to develop more accurate and precise soft moderation systems, especially on a platform like TikTok, which is extremely popular among younger audiences.
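
    The collection step described above filters videos by whether their description contains any of the tracked hashtags. A minimal sketch of that idea in Python, assuming videos are stored as dictionaries with a "description" field and using an illustrative subset of hashtags rather than the paper's full set of 26:

        import re

        # Illustrative subset, not the paper's actual hashtag set.
        COVID_HASHTAGS = {"#covid19", "#coronavirus", "#covidvaccine"}

        def matches_covid_hashtags(description: str) -> bool:
            """Return True if the description contains any tracked COVID-19 hashtag."""
            tags = {t.lower() for t in re.findall(r"#\w+", description)}
            return bool(tags & COVID_HASHTAGS)

        videos = [
            {"id": "1", "description": "Stay safe everyone #covid19 #fyp"},
            {"id": "2", "description": "My dog being silly #dogsoftiktok"},
        ]
        covid_videos = [v for v in videos if matches_covid_hashtags(v["description"])]
        print([v["id"] for v in covid_videos])  # -> ['1']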

    Understanding and Detecting Hateful Content using Contrastive Learning

    The spread of hate speech and hateful imagery on the Web is a significant problem that needs to be mitigated to improve our Web experience. This work contributes to research efforts to detect and understand hateful content on the Web by undertaking a multimodal analysis of Antisemitism and Islamophobia on 4chan's /pol/ using OpenAI's CLIP, a large pre-trained model based on the Contrastive Learning paradigm. We devise a methodology to identify a set of Antisemitic and Islamophobic hateful textual phrases using Google's Perspective API and manual annotations. Then, we use OpenAI's CLIP to identify images that are highly similar to our Antisemitic/Islamophobic textual phrases. By running our methodology on a dataset that includes 66M posts and 5.8M images shared on 4chan's /pol/ over 18 months, we detect 573,513 posts containing 92K Antisemitic/Islamophobic images and 246K posts that include 420 hateful phrases. Among other things, we find that we can use OpenAI's CLIP model to detect hateful content with an accuracy score of 0.84 (F1 score = 0.58). Also, we find that Antisemitic/Islamophobic imagery is shared in twice as many posts on 4chan's /pol/ as Antisemitic/Islamophobic textual phrases, highlighting the need to design more tools for detecting hateful imagery. Finally, we make publicly available a dataset of 420 Antisemitic/Islamophobic phrases and 92K images that can assist researchers in further understanding Antisemitism/Islamophobia and developing more accurate hate speech detection models.
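
    The core CLIP step scores how similar an image is to a textual phrase in a shared embedding space. A minimal sketch of that idea, assuming the Hugging Face transformers implementation of CLIP rather than the authors' exact pipeline; the phrase, image path, and any similarity threshold are placeholders:

        from PIL import Image
        import torch
        from transformers import CLIPModel, CLIPProcessor

        model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

        phrase = "example hateful phrase"      # placeholder textual phrase
        image = Image.open("post_image.jpg")   # placeholder image from a post

        inputs = processor(text=[phrase], images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                               attention_mask=inputs["attention_mask"])
            image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

        # Cosine similarity between phrase and image; thresholding this score
        # would flag images "highly similar" to the phrase.
        similarity = torch.nn.functional.cosine_similarity(text_emb, image_emb).item()
        print(similarity)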

    On the Globalization of the QAnon Conspiracy Theory Through Telegram

    QAnon is a far-right conspiracy theory that became popular and mainstream over the past few years. Worryingly, the QAnon conspiracy theory has implications in the real world, with supporters of the theory participating in real-world violent acts like the US Capitol attack in 2021. At the same time, the QAnon theory started evolving into a global phenomenon by attracting followers across the globe and, in particular, in Europe. Therefore, it is imperative to understand how the QAnon theory became a worldwide phenomenon and how this dissemination has been happening in the online space. This paper performs a large-scale data analysis of QAnon through Telegram by collecting 4.5M messages posted in 161 QAnon groups/channels. Using Google's Perspective API, we analyze the toxicity of QAnon content across languages and over time. Also, using a BERT-based topic modeling approach, we analyze the QAnon discourse across multiple languages. Among other things, we find that the German language is prevalent in QAnon groups/channels on Telegram, even overshadowing English after 2020. Also, we find that content posted in German and Portuguese tends to be more toxic compared to English. Our topic modeling indicates that QAnon supporters discuss various topics of interest within far-right movements, including world politics, conspiracy theories, COVID-19, and the anti-vaccination movement. Taken all together, we perform the first multilingual study on QAnon through Telegram and paint a nuanced overview of the globalization of the QAnon theory.
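
    The per-message toxicity scores come from Google's Perspective API, which returns attribute probabilities for a piece of text. A minimal sketch of scoring one Telegram message, assuming the public commentanalyzer endpoint; the API key and message text are placeholders, and the study's actual request parameters may differ:

        import requests

        API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder
        URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
               f"comments:analyze?key={API_KEY}")

        def toxicity_score(text: str, language: str) -> float:
            """Return Perspective's TOXICITY probability for a message."""
            payload = {
                "comment": {"text": text},
                "languages": [language],
                "requestedAttributes": {"TOXICITY": {}},
            }
            response = requests.post(URL, json=payload, timeout=30)
            response.raise_for_status()
            data = response.json()
            return data["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

        print(toxicity_score("example message text", "en"))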

    Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board

    This paper presents a dataset with over 3.3M threads and 134.5M posts from the Politically Incorrect board (/pol/) of the imageboard forum 4chan, posted over a period of almost 3.5 years (June 2016-November 2019). To the best of our knowledge, this represents the largest publicly available 4chan dataset, providing the community with an archive of posts that have been permanently deleted from 4chan and are otherwise inaccessible. We augment the data with a set of additional labels, including toxicity scores and the named entities mentioned in each post. We also present a statistical analysis of the dataset, providing an overview of what researchers interested in using it can expect, as well as a simple content analysis, shedding light on the most prominent discussion topics, the most popular entities mentioned, and the toxicity level of each post. Overall, we are confident that our work will motivate and assist researchers in studying and understanding 4chan, as well as its role on the greater Web. For instance, we hope this dataset may be used for cross-platform studies of social media, as well as for other types of research like natural language processing. Finally, our dataset can assist qualitative work focusing on in-depth case studies of specific narratives, events, or social theories.
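
    The named-entity labels mentioned above could, for example, be reproduced with an off-the-shelf NER model. A minimal sketch assuming spaCy and its small English model, which is an illustrative choice and not necessarily the tooling behind the released labels:

        import spacy

        # Assumes: pip install spacy && python -m spacy download en_core_web_sm
        nlp = spacy.load("en_core_web_sm")

        def extract_entities(post_text: str) -> list[tuple[str, str]]:
            """Return (entity text, entity label) pairs mentioned in a post."""
            doc = nlp(post_text)
            return [(ent.text, ent.label_) for ent in doc.ents]

        print(extract_entities("Trump met Putin in Helsinki."))
        # e.g. [('Trump', 'PERSON'), ('Putin', 'PERSON'), ('Helsinki', 'GPE')]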

    What Do Fact Checkers Fact-check When?

    Recent research suggests that not all fact checking efforts are equal: when and what is fact checked plays a pivotal role in effectively correcting misconceptions. In this paper, we propose a framework to study fact checking efforts using Google Trends, a signal that captures search interest over topics on the world's largest search engine. Our framework consists of extracting claims from fact checking efforts, linking such claims with knowledge graph entities, and estimating the online attention they receive. We use this framework to study a dataset of 879 COVID-19-related fact checks done in 2020 by 81 international organizations. Our findings suggest that there is often a disconnect between online attention and fact checking efforts. For example, in around 40% of countries where 10 or more claims were fact checked, at least half of the top 10 most popular claims were not fact checked. Our analysis also shows that claims are first fact checked after receiving, on average, 35% of the total online attention they would eventually receive in 2020. Yet, there is a big variation among claims: some were fact checked before receiving a surge of misinformation-induced online attention, while others were fact checked much later. Overall, our work suggests that incorporating online attention signals may help organizations better assess and prioritize their fact checking efforts. Also, in the context of international collaboration, where claims are fact checked multiple times across different countries, online attention could help organizations keep track of which claims are "migrating" between different countries.
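
    A rough sketch of the attention-estimation step, assuming the unofficial pytrends client for Google Trends; the claim keyword and the fact-check cut-off date below are illustrative, and the paper links claims to knowledge graph entities rather than raw search terms:

        from pytrends.request import TrendReq

        pytrends = TrendReq(hl="en-US")
        pytrends.build_payload(kw_list=["5g coronavirus"],
                               timeframe="2020-01-01 2020-12-31")
        interest = pytrends.interest_over_time()  # search interest per week, scaled 0-100

        # Share of the claim's 2020 attention received before a hypothetical
        # fact-check date (2020-04-01).
        before = interest.loc[:"2020-04-01", "5g coronavirus"].sum()
        total = interest["5g coronavirus"].sum()
        print(before / total)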

    Soros, Child Sacrifices, and 5G: Understanding the Spread of Conspiracy Theories on Web Communities

    This paper presents a multi-platform computational pipeline geared to identify social media posts discussing (known) conspiracy theories. We use 189 conspiracy claims collected by Snopes and find 66K posts and 277K comments on Reddit, as well as 379K tweets, discussing them. Then, we study how conspiracies are discussed on different Web communities and which ones are particularly influential in driving the discussion about them. Our analysis sheds light on how conspiracy theories are discussed and spread online, while highlighting multiple challenges in mitigating them.
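
    The abstract does not detail how posts are matched to claims. As a simplified illustration only (not the paper's actual pipeline), one could flag posts that share enough distinctive words with a claim:

        def keyword_set(text: str) -> set[str]:
            """Crude set of distinctive words (longer than 3 characters)."""
            return {w.lower().strip('.,!?"') for w in text.split() if len(w) > 3}

        def discusses_claim(post: str, claim: str, min_overlap: int = 3) -> bool:
            """Flag a post as discussing a claim if they share enough distinctive words."""
            return len(keyword_set(post) & keyword_set(claim)) >= min_overlap

        claim = "5G towers spread the coronavirus"
        post = "people really think 5G towers spread coronavirus, unbelievable"
        print(discusses_claim(post, claim))  # True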

    "Is it a {Qoincidence}?": {A} First Step Towards Understanding and Characterizing the {QAnon} Movement on {Voat.co}

    Online fringe communities offer fertile grounds for users to seek and share paranoid ideas fueling suspicion of mainstream news and outright conspiracy theories. Among these, the QAnon conspiracy theory emerged in 2017 on 4chan, broadly supporting the idea that powerful politicians, aristocrats, and celebrities are closely engaged in a global pedophile ring. At the same time, governments are thought to be controlled by "puppet masters," with democratically elected officials serving as a fake showroom of democracy. In this paper, we provide an empirical exploratory analysis of the QAnon community on Voat.co, a Reddit-esque news aggregator which has recently captured the interest of the press for its toxicity and for providing a platform to QAnon followers. More precisely, we analyze a large dataset from /v/GreatAwakening, the most popular QAnon-related subverse (the Voat equivalent of a subreddit), to characterize activity and user engagement. To further understand the discourse around QAnon, we study the most popular named entities mentioned in the posts, along with the most prominent topics of discussion, which focus on US politics, Donald Trump, and world events. We also use word2vec models to identify narratives around QAnon-specific keywords, and our graph visualization shows that some QAnon-related keywords are closely related to those from the Pizzagate conspiracy theory and "drops" by "Q." Finally, we analyze content toxicity, finding that discussions on /v/GreatAwakening are less toxic than in the broader Voat community.
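
    A minimal sketch of the word2vec step, assuming gensim and a toy tokenized corpus in place of the full /v/GreatAwakening dataset; the keyword, corpus, and hyperparameters are illustrative:

        from gensim.models import Word2Vec

        # Each item is a tokenized post (toy corpus for illustration only).
        corpus = [
            ["q", "drop", "great", "awakening", "trust", "the", "plan"],
            ["pizzagate", "pedophile", "ring", "cover", "up"],
            ["q", "drop", "deep", "state", "puppet", "masters"],
        ]

        model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, workers=2)
        # Words used in contexts similar to a QAnon-specific keyword.
        print(model.wv.most_similar("q", topn=5))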

    "It is just a flu": {A}ssessing the Effect of Watch History on {YouTube}'s Pseudoscientific Video Recommendations

    YouTube has revolutionized the way people discover and consume videos, becoming one of the primary news sources for Internet users. Since content on YouTube is generated by its users, the platform is particularly vulnerable to misinformative and conspiratorial videos. Even worse, the role played by YouTube's recommendation algorithm in unwittingly promoting questionable content is not well understood and could potentially exacerbate the problem. This can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, e.g., during the COVID-19 pandemic. In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the flat earth theory, and the anti-vaccination and anti-mask movements; using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant. We then train a deep learning classifier to detect pseudoscientific videos with an accuracy of 76.1%. Next, we quantify user exposure to this content on various parts of the platform (i.e., a user's homepage, recommended videos while watching a specific video, or search results) and how this exposure changes based on the user's watch history. We find that YouTube's recommendation algorithm is more aggressive in suggesting pseudoscientific content when users are searching for specific topics, while these recommendations are less common on a user's homepage or when actively watching pseudoscientific videos. Finally, we shed light on how a user's watch history substantially affects the type of recommended videos.
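
    As a simplified stand-in for the deep learning classifier described above (not the paper's actual model), a linear baseline over video transcripts or metadata illustrates the labeling task; the toy texts and labels below are invented for illustration:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Toy training data standing in for annotated video transcripts/metadata.
        texts = [
            "the earth is flat and nasa hides the truth",
            "vaccines cause more harm than the virus itself",
            "peer reviewed study on vaccine efficacy explained",
            "how mrna vaccines work according to immunologists",
        ]
        labels = ["pseudoscience", "pseudoscience", "science", "science"]

        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(texts, labels)
        print(clf.predict(["flat earth proof the horizon never curves"]))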

    Understanding the Effect of Deplatforming on Social Networks

    Aiming to enhance the safety of their users, social media platforms enforce terms of service by performing active moderation, including removing content or suspending users. Nevertheless, we do not have a clear understanding of how effective it ultimately is to suspend users who engage in toxic behavior, as that might actually draw users to alternative platforms where moderation is laxer. Moreover, these deplatforming efforts might end up nudging abusive users towards more extreme ideologies and potential radicalization risks. In this paper, we set out to understand what happens when users get suspended on a social platform and move to an alternative one. We focus on accounts active on Gab that were suspended from Twitter and Reddit. We develop a method to identify accounts belonging to the same person on these platforms, and observe whether there was a measurable difference in the activity and toxicity of these accounts after suspension. We find that users who get banned on Twitter/Reddit exhibit an increased level of activity and toxicity on Gab, although the audience they potentially reach decreases. Overall, we argue that moderation efforts should go beyond ensuring the safety of users on a single platform, taking into account the potential adverse effects of banning users on major platforms.
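
    The abstract does not spell out how accounts are matched across platforms. As a simplified illustration only (not the paper's method), a first-pass candidate matching could compare usernames across the two platforms:

        from difflib import SequenceMatcher

        def username_similarity(a: str, b: str) -> float:
            """Similarity ratio between two usernames, ignoring case."""
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        gab_users = ["free_speech_max", "patriot1776"]
        suspended_twitter_users = ["FreeSpeechMax", "totally_other_user"]

        # Candidate same-person pairs; in practice such candidates would need
        # further verification (e.g., profile and posting-behavior signals).
        candidates = [
            (g, t) for g in gab_users for t in suspended_twitter_users
            if username_similarity(g, t) >= 0.8
        ]
        print(candidates)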